Boosting-Based Ensemble Learning with Penalty Setting Profiles for Automatic Thai Unknown Word Recognition
Authors
Abstract
Boosting-based ensemble learning can improve classification accuracy by constructing multiple classification models, each built to cope with the errors of the preceding steps. This paper presents an application of boosting-based ensemble learning with penalty setting profiles to automatic unknown word recognition in Thai. Treating a sequential task as a non-sequential problem requires us to rank a set of generated candidates for each potential unknown word position. Since the correct candidate might not be located at the highest rank among the candidates in the set, the proposed method assigns penalties, in the form of a penalty setting profile, to improper rankings in order to reconstruct the succeeding classification model. In addition, a number of alternative penalty setting profiles are introduced, and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using naïve Bayes as the base classifier for ensemble learning, the proposed method achieves an accuracy of 89.24%, an improvement of 9.91%, 7.54%, and 5.25% over conventional naïve Bayes, the non-ensemble version, and the flat penalty setting profile, respectively.
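The idea of penalizing improper rankings between boosting rounds can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's actual procedure: the weighted Bernoulli naïve Bayes, the binary toy features, the two profile functions (`flat_profile`, `linear_profile`), the rule of raising only the correct candidate's weight, and the unweighted averaging of model scores are all hypothetical choices.

```python
import math

def train_nb(examples, weights):
    """Weighted Bernoulli naive Bayes with add-one smoothing (illustrative)."""
    n_features = len(examples[0][0])
    mass = [0.0, 0.0]                                # weighted class totals
    on = [[0.0] * n_features for _ in range(2)]      # weighted feature counts
    for (x, y), w in zip(examples, weights):
        mass[y] += w
        for j, v in enumerate(x):
            on[y][j] += w * v
    total = mass[0] + mass[1]
    prior = [(mass[c] + 1.0) / (total + 2.0) for c in (0, 1)]
    cond = [[(on[c][j] + 1.0) / (mass[c] + 2.0) for j in range(n_features)]
            for c in (0, 1)]
    return prior, cond

def log_odds(model, x):
    """Log-odds that candidate x is the correct one under a single model."""
    prior, cond = model
    s = math.log(prior[1]) - math.log(prior[0])
    for j, v in enumerate(x):
        p1 = cond[1][j] if v else 1.0 - cond[1][j]
        p0 = cond[0][j] if v else 1.0 - cond[0][j]
        s += math.log(p1) - math.log(p0)
    return s

def rank_of_correct(model, group):
    """Rank (1 = best) of the correct candidate; ties count against it."""
    scores = [log_odds(model, x) for x, _ in group]
    correct = next(i for i, (_, y) in enumerate(group) if y == 1)
    return 1 + sum(1 for i, s in enumerate(scores)
                   if i != correct and s >= scores[correct])

# Hypothetical penalty setting profiles: map a rank to a penalty.
def flat_profile(rank):
    return 0.0 if rank == 1 else 1.0     # same penalty for any wrong rank

def linear_profile(rank):
    return float(rank - 1)               # penalty grows with the rank

def boost(groups, profile, rounds=5):
    """Train one model per round, re-weighting badly ranked correct candidates."""
    examples = [ex for g in groups for ex in g]
    weights = [1.0] * len(examples)
    models = []
    for _ in range(rounds):
        model = train_nb(examples, weights)
        models.append(model)
        offset = 0
        for g in groups:
            penalty = profile(rank_of_correct(model, g))
            for i, (_, y) in enumerate(g):
                if y == 1:               # assumed rule: boost the correct candidate
                    weights[offset + i] *= 1.0 + penalty
            offset += len(g)
        scale = len(examples) / sum(weights)   # keep total weight constant
        weights = [w * scale for w in weights]
    return models

def ensemble_score(models, x):
    """Unweighted average of the per-model log-odds (an assumed combination rule)."""
    return sum(log_odds(m, x) for m in models) / len(models)

# Toy candidate groups: each holds exactly one correct candidate (label 1).
toy_groups = [
    [((1, 1, 0), 1), ((1, 0, 0), 0), ((0, 0, 1), 0)],
    [((1, 1, 1), 1), ((0, 1, 0), 0), ((0, 0, 0), 0)],
    [((0, 1, 1), 1), ((1, 0, 1), 0), ((0, 0, 0), 0)],
]
toy_models = boost(toy_groups, linear_profile, rounds=3)
```

Swapping `linear_profile` for `flat_profile` in the call to `boost` gives the kind of comparison between profiles that the abstract describes, although this toy data is far too small to say anything about which profile works better.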
Similar resources
A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques
While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is far smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking e...
Investigations on ensemble based semi-supervised acoustic model training
Semi-supervised learning has been recognized as an effective way to improve acoustic model training in cases where sufficient transcribed data are not available. Unlike most existing approaches, which use only a single acoustic model and focus on how to refine it, this paper investigates the feasibility of using ensemble methods for semi-supervised acoustic model training. Two methods...
Ensemble Methods for Phoneme Classification
In this paper we investigate a number of ensemble methods for improving the performance of phoneme classification for use in a speech recognition system. We discuss boosting and mixtures of experts, both in isolation and in combination. We present results on an isolated word database. The results show that principled ensemble methods such as boosting and mixtures provide superior performance to ...
Ensemble Methods for Phoneme Classification
This paper investigates a number of ensemble methods for improving the performance of phoneme classification for use in a speech recognition system. Two ensemble methods are described: boosting and mixtures of experts, both in isolation and in combination. Results are presented on two speech recognition databases: an isolated word database and a large vocabulary continuous speech database. Thes...
A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System
In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new classes of intrusions that do not exist in the training dataset for incremental misuse detection. As...